🎯 Reinforcement Learning
Q-learning, Policy Gradient, Reward Functions, TD Learning
Online Statistical Inference of Constant Sample-averaged Q-Learning
📊 Optimization · arxiv.org · 2d
Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models
📊 Dynamic Programming · dani2442.github.io · 5d · Hacker News
Scalable conflict-free bandit algorithm using a quantum optical setup
⚛️ Quantum Computing · nature.com · 1d
Policywerk: From Tables to Imagination
🎲 Deterministic Simulation · dehora.net · 5d
Functional Natural Policy Gradients
📊 Optimization · arxiv.org · 2d
FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies
🌊 Noria · arxiv.org · 2d
Reinforcement learning for quantum processes with memory
⚛️ Quantum Computing · arxiv.org · 6d
Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning
🤖 Robotics · arxiv.org · 2d
Target-Aligned Reinforcement Learning
📊 Dynamic Programming · arxiv.org · 1d
A Lyapunov Analysis of Softmax Policy Gradient for Stochastic Bandits
📊 Optimization · arxiv.org · 3d
A Pontryagin Method of Model-based Reinforcement Learning via Hamiltonian Actor-Critic
📊 Dynamic Programming · arxiv.org · 1d
Empowering Epidemic Response: The Role of Reinforcement Learning in Infectious Disease Control
📊 Dynamic Programming · arxiv.org · 3d
Optimistic Online LQR via Intrinsic Rewards
📊 Optimization · arxiv.org · 1d
An Output Feedback Q-learning Algorithm for Optimal Control of Nonlinear Systems with Koopman Linear Embedding
📊 Dynamic Programming · arxiv.org · 1d
Automatic feature identification in least-squares policy iteration using the Koopman operator framework
📊 Dynamic Programming · arxiv.org · 3d
Sound Value Iteration for Simple Stochastic Games
📊 Dynamic Programming · arxiv.org · 2d
Dynamic resource matching in manufacturing using deep reinforcement learning
⚡ Incremental Computation · arxiv.org · 2d
Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems
🚀 Superoptimization · arxiv.org · 3d
Match or Replay: Self Imitating Proximal Policy Optimization
🎲 Deterministic Simulation · arxiv.org · 2d
D-SPEAR: Dual-Stream Prioritized Experience Adaptive Replay for Stable Reinforcement Learning in Robotic Manipulation
📊 Dynamic Programming · arxiv.org · 2d